Fix IntegrateGQ.sh errors due to presence of variants of just one type #760

Open
wants to merge 20 commits into
base: main
from

Conversation

kjaisingh
Collaborator

@kjaisingh kjaisingh commented Dec 9, 2024

This PR addresses Issue #759.

Description

Addresses an issue in the IntegrateGQ.sh script where, with set -o pipefail enabled, pipelines built around the { fgrep ... || true; } pattern could still fail prematurely when no matches were found. The filtering is now done with awk using ARGIND (a gawk extension), so a shard with no matching records no longer fails silently and the script continues processing regardless of the SVTYPE makeup of the shard.
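
For illustration, here is a minimal sketch of the behaviour described above. It is not the code from IntegrateGQ.sh: the file names, the toy genotype table, and the filter_by_ids helper are all hypothetical. It uses grep -F (equivalent to fgrep) and the portable test FILENAME == ARGV[1] in place of gawk's ARGIND == 1, so that it runs under any POSIX awk.

```shell
#!/usr/bin/env bash
set -o pipefail

# Hypothetical shard: the genotype table holds only DUP records,
# and the DEL ID list names a variant that is absent from it.
workdir=$(mktemp -d)
printf 'var1\tDUP\nvar2\tDUP\n' > "$workdir/geno.txt"
printf 'del1\n' > "$workdir/del_ids.txt"
printf 'var2\n' > "$workdir/dup_ids.txt"

# grep exits 1 when nothing matches; under pipefail that status
# becomes the exit status of the whole pipeline.
grep_status=0
grep -wFf "$workdir/del_ids.txt" "$workdir/geno.txt" \
  | cut -f1 > "$workdir/grep_out.txt" || grep_status=$?

# awk exits 0 whether or not any record matches.
# Records from the first file populate the ID set; records from
# the second file are printed only if their first column is in it.
# (Assumption: gawk's ARGIND == 1 plays the role of FILENAME == ARGV[1].)
filter_by_ids() {
  awk 'FILENAME == ARGV[1] { ids[$1]; next } ($1 in ids)' "$1" "$2"
}

awk_status=0
filter_by_ids "$workdir/del_ids.txt" "$workdir/geno.txt" \
  | cut -f1 > "$workdir/awk_out.txt" || awk_status=$?

echo "grep pipeline status: $grep_status"
echo "awk pipeline status:  $awk_status"
```

In this sketch the grep pipeline reports a non-zero status for the DEL-free shard, while the awk pipeline succeeds and simply writes an empty file. Testing which input file is being read, rather than the common NR == FNR idiom, also keeps the filter correct when the ID list itself is empty.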

Testing

Test 1: Changes circumvent WDL failure.

  • This Terra job demonstrates a run with the previous version of IntegrateGQ.sh, wherein every variant in the depth & PESR VCFs is of a single type - note that the run fails.
  • This Terra job demonstrates the identical run but with the updated version of IntegrateGQ.sh - note that the run completes successfully.

Test 2: Changes do not change the existing logic.

  • This Terra job demonstrates a run with the previous version of IntegrateGQ.sh, including all types of variants.
  • This Terra job demonstrates the identical run but with the updated version of IntegrateGQ.sh - note that the output depth & PESR VCFs are identical.

Test 3: Changes produce expected results when using DEL-only inputs.
Conducted local tests where I took the IntegrateGQ.sh inputs from a run that contained all variant types and removed all non-DEL records from them. After running IntegrateGQ.sh with these DEL-only inputs, the outputs were identical to the matching DEL records in the outputs of the run whose inputs contained all variants.
Note: If it's preferred to reflect this in Terra, let me know - I did it locally for the time being as there is no dedicated WDL file to test IntegrateGQ in isolation, though I could make one.

Test 4: Changes produce expected results when using only non-DEL inputs.
Same as Test 3, but this time I removed all DEL records from the inputs. The outputs were identical to the matching non-DEL records in the outputs of the run whose inputs contained all variants.

@kjaisingh kjaisingh added the bug Something isn't working label Dec 9, 2024
@kjaisingh kjaisingh self-assigned this Dec 9, 2024
@kjaisingh kjaisingh linked an issue Dec 9, 2024 that may be closed by this pull request
1 task
@kjaisingh kjaisingh requested a review from mwalker174 January 24, 2025 23:49
@kjaisingh kjaisingh marked this pull request as ready for review January 24, 2025 23:52
@@ -30,41 +30,57 @@ zcat $RD_melted_genotypes \
|gzip \
>rd_indiv_geno.txt.gz

##Deletions, need to PE-SR genotypes to match RD format (2==ref)##
Collaborator

I think you should keep this comment in and the one about duplications below

Collaborator Author

Updated accordingly.

@kjaisingh
Collaborator Author

@mwalker174 Going to mark this as a draft PR until I've got more clarity on why some of the genotypes are being changed. I did take a quick look already, and the specific outputs from IntegrateGQ.sh (genotype.variant.txt.gz and genotype.indiv.txt.gz) appear to be identical between the two runs, so I need to better understand the other genotyping tasks first in order to gauge why some genotypes are changing.

@kjaisingh kjaisingh marked this pull request as draft January 28, 2025 20:17
@kjaisingh kjaisingh marked this pull request as ready for review January 30, 2025 23:05
Development

Successfully merging this pull request may close these issues.

IntegrateGQ.sh error trapping doesn't work, causing pipe failures
2 participants